model generation
- Asia > Japan (0.05)
- North America > United States > Virginia (0.04)
- North America > United States > Hawaii (0.04)
- (5 more...)
- Government (1.00)
- Education (0.93)
- Law (0.69)
Mechanistic Interpretability of Antibody Language Models Using SAEs
Haque, Rebonto, Turnbull, Oliver M., Parsan, Anisha, Parsan, Nithin, Yang, John J., Deane, Charlotte M.
Sparse autoencoders (SAEs) are a mechanistic interpretability technique that has been used to provide insight into learned concepts within large protein language models. Here, we employ TopK and Ordered SAEs to investigate an autoregressive antibody language model, p-IgGen, and steer its generation. We show that TopK SAEs can reveal biologically meaningful latent features, but high feature-concept correlation does not guarantee causal control over generation. In contrast, Ordered SAEs impose a hierarchical structure that reliably identifies steerable features, but at the expense of more complex and less interpretable activation patterns. These findings advance the mechanistic interpretability of domain-specific protein language models and suggest that, while TopK SAEs are sufficient for mapping latent features to concepts, Ordered SAEs are preferable when precise generative steering is required.
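For readers unfamiliar with the TopK sparsity constraint mentioned in this abstract, the following is a minimal sketch of one common variant, not the authors' implementation. The function name `topk_activation` and the choice to apply a ReLU before selecting the top k entries are assumptions made for illustration.

```python
from typing import List


def topk_activation(pre_acts: List[float], k: int) -> List[float]:
    """Keep the k largest latent pre-activations and zero out the rest.

    Hypothetical helper: a ReLU is applied first (an assumption), so the
    surviving activations are non-negative.
    """
    relu = [max(0.0, a) for a in pre_acts]
    # Indices of the k largest values; ties are broken by the lower index.
    keep = set(sorted(range(len(relu)), key=lambda i: (-relu[i], i))[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(relu)]


# Only the two strongest latents survive.
sparse = topk_activation([0.5, -1.0, 2.0, 1.5], k=2)
```

In a full SAE, the sparse vector would then be passed through a decoder matrix to reconstruct the model activation; that step is omitted here.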
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
Towards a General Framework for HTN Modeling with LLMs
Puerta-Merino, Israel, Núñez-Molina, Carlos, Mesejo, Pablo, Fernández-Olivares, Juan
The use of Large Language Models (LLMs) for generating Automated Planning (AP) models has been widely explored; however, their application to Hierarchical Planning (HP) is still far from reaching the level of sophistication observed in non-hierarchical architectures. In this work, we aim to address this gap. We present two main contributions. First, we propose L2HP, an extension of L2P (a library for LLM-driven PDDL model generation) that supports HP model generation and follows a design philosophy of generality and extensibility. Second, we apply our framework to perform experiments in which we compare the modeling capabilities of LLMs for AP and HP. On the PlanBench dataset, results show that parsing success is limited but comparable in both settings (around 36%), while syntactic validity is substantially lower in the hierarchical case (1% vs. 20% of instances). These findings underscore the unique challenges HP presents for LLMs, highlighting the need for further research to improve the quality of generated HP models.
- Asia > Middle East > Israel (0.40)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
Checklist
Do the main claims made in the abstract and introduction accurately reflect the paper's
Did you describe the limitations of your work?
Did you specify all the training details (e.g., data splits, hyperparameters, how they
Did you report error bars (e.g., with respect to the random seed after running experi-
Did you include the total amount of compute and the type of resources used (e.g., type
If your work uses existing assets, did you cite the creators?
Did you mention the license of the assets?
Did you include any new assets either in the supplemental material or as a URL? [Yes]
Did you discuss whether and how consent was obtained from people whose data you're
We thereby state that we bear all responsibility in case of violation of rights, etc., and confirmation of
For what purpose was the dataset created? - For the novel task of data analysis as explained
Who created the dataset and on behalf of which entity? - This dataset is created during a
Who funded the creation of the dataset?
What do the instances that comprise the dataset represent?
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Banking & Finance (0.96)
- Health & Medicine (0.94)
Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders
Goyal, Agam, Rathi, Vedant, Yeh, William, Wang, Yian, Chen, Yuen, Sundaram, Hari
Large language models (LLMs) are now ubiquitous in user-facing applications, yet they still generate undesirable toxic outputs, including profanity, vulgarity, and derogatory remarks. Although numerous detoxification methods exist, most apply broad, surface-level fixes and can therefore easily be circumvented by jailbreak attacks. In this paper we leverage sparse autoencoders (SAEs) to identify toxicity-related directions in the residual stream of models and perform targeted activation steering using the corresponding decoder vectors. We introduce three tiers of steering aggressiveness and evaluate them on GPT-2 Small and Gemma-2-2B, revealing trade-offs between toxicity reduction and language fluency. At stronger steering strengths, these causal interventions surpass competitive baselines in reducing toxicity by up to 20%, though fluency can degrade noticeably on GPT-2 Small depending on the aggressiveness. Crucially, standard NLP benchmark scores upon steering remain stable, indicating that the model's knowledge and general abilities are preserved. We further show that feature-splitting in wider SAEs hampers safety interventions, underscoring the importance of disentangled feature learning. Our findings highlight both the promise and the current limitations of SAE-based causal interventions for LLM detoxification, further suggesting practical guidelines for safer language-model deployment.
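The activation steering described in this abstract can be illustrated with a small sketch: a multiple of an SAE decoder vector is added to (or, with a negative strength, subtracted from) a residual-stream activation. This is an assumption-laden illustration rather than the paper's method; the function name `steer` and the single scalar `strength` are hypothetical, and the paper's three tiers of aggressiveness are not reproduced here.

```python
from typing import List


def steer(residual: List[float], decoder_vec: List[float], strength: float) -> List[float]:
    """Shift a residual-stream vector along one SAE decoder direction.

    Hypothetical helper: a negative `strength` pushes the activation away
    from the feature (e.g. a toxicity-related one); a positive value
    pushes toward it.
    """
    if len(residual) != len(decoder_vec):
        raise ValueError("residual and decoder vector must have the same length")
    return [h + strength * d for h, d in zip(residual, decoder_vec)]


# Suppress the feature by moving against its decoder direction.
steered = steer([1.0, 2.0], [0.5, -1.0], strength=-2.0)
```

Larger magnitudes of `strength` correspond to more aggressive interventions, which is where the abstract reports a trade-off against fluency.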
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- (13 more...)
- Health & Medicine (0.67)
- Media > Television (0.40)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)
VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs
Goel, Raghavv, Agrawal, Sudhanshu, Gagrani, Mukul, Park, Junyoung, Zao, Yifan, Zhang, He, Liu, Tian, Yang, Yiping, Yuan, Xin, Lu, Jiuyan, Lott, Chris, Lee, Mingu
In this paper, we introduce a simple training-free technique to improve the performance of drafter-based speculative decoding (SpD) methods that incorporate a language modeling head (LM head) during the drafting process. Drafter-based speculative decoding leverages one or more smaller language models, a.k.a. drafters or draft models, to sample a draft sequence or tree consisting of multiple tokens, followed by verification by a base LLM, the target model, which accepts a subset as its valid generation. Because speculative decoding is usually assumed to require a one-to-one mapping between the vocabularies of the target model and the draft model, it has been natural to share the vocabulary between them, or even to share the LM head as in EAGLE or Medusa. We first identify that this draft token sampling scheme inherently contains an unnecessary inference overhead in drafting, especially for target LLMs with very large vocabularies. Then, we propose a simple technique, VocabTrim, to mitigate the drafting overhead and improve the generation speed in memory-bound environments. VocabTrim reconstructs the drafter LM head to contain only a limited set of tokens, selected as those most frequently sampled from the vocabulary of the target model. While limiting the vocabulary in drafting slightly degrades the acceptance rate, it significantly reduces the drafting latency in memory-bound settings, which is often the case on edge devices, resulting in a higher memory-bound speed-up (MBSU). We show that our method can boost the memory-bound speed-up for Llama-3 models on Spec-Bench, specifically by 16% for Llama-3.2-3B-Instruct.
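The idea of trimming the drafter LM head to frequently sampled tokens can be sketched as follows. This is a toy illustration under stated assumptions, not the VocabTrim implementation: the names `trim_lm_head` and `draft_argmax` are hypothetical, the LM head is a plain list of weight rows (one per token id), and greedy argmax stands in for whatever sampling scheme a drafter actually uses.

```python
from collections import Counter
from typing import List, Tuple


def trim_lm_head(
    lm_head: List[List[float]], sampled_token_ids: List[int], keep: int
) -> Tuple[List[List[float]], List[int]]:
    """Keep only the LM-head rows of the `keep` most frequently sampled tokens.

    Returns the trimmed head and the list mapping each trimmed row index
    back to its token id in the full vocabulary.
    """
    counts = Counter(sampled_token_ids)
    kept_ids = sorted(tok for tok, _ in counts.most_common(keep))
    return [lm_head[t] for t in kept_ids], kept_ids


def draft_argmax(trimmed_head: List[List[float]], kept_ids: List[int], hidden: List[float]) -> int:
    """Greedy draft step: compute logits over the trimmed vocabulary only."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in trimmed_head]
    best = max(range(len(logits)), key=logits.__getitem__)
    return kept_ids[best]  # map back to the full-vocabulary token id


# Toy 4-token vocabulary with 2-dimensional hidden states.
full_head = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
observed = [2, 2, 2, 1, 1, 0, 3, 3, 3, 3]  # token ids sampled from the target model
small_head, id_map = trim_lm_head(full_head, observed, keep=2)
token = draft_argmax(small_head, id_map, hidden=[1.0, 0.5])
```

The drafter now multiplies against 2 rows instead of 4; the saving grows with vocabulary size, at the cost of never proposing a trimmed-out token.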
- North America > United States (0.24)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)